我们提出了一个深层神经网络,用于从不受约束的肖像图像中删除不良阴影特征,从而恢复基础纹理。我们的培训计划纳入了三种正则化策略:蒙面损失,以强调高频阴影特征;软阴影损失,改善了对照明微妙变化的敏感性;和阴影偏移估计,以监督阴影和纹理的分离。与最先进的方法相比,我们的方法表明了质量和概括的改善。我们进一步展示了我们的愉悦方法如何增强光敏的计算机视觉任务任务(例如面部重新放置和语义解析)的性能,从而使它们能够处理极端的照明条件。
translated by 谷歌翻译
我们向连续状态马尔可夫决策过程(MDP)提出了一种扩散近似方法,该方法可用于解决非结构化的越野环境中的自主导航和控制。与呈现完全已知的状态转换模型的大多数决策定理计划框架相比,我们设计了一种方法,该方法消除了这种强烈假设,这些假设通常非常难以在现实中工程师。我们首先采用价值函数的二阶泰勒扩展。然后通过部分微分方程近似贝尔曼的最优性方程,其仅依赖于转换模型的第一和第二矩。通过组合价值函数的内核表示,然后设计一种有效的策略迭代算法,其策略评估步骤可以表示为特征的方程式的线性系统,其特征是由有限组支持状态。我们首先通过大量的仿真以2D美元的$ 2D $避让和2.5d $地形导航问题进行验证。结果表明,拟议的方法在几个基线上导致了卓越的性能。然后,我们开发一个系统,该系统将我们的决策框架整合,与船上感知,并在杂乱的室内和非结构化的户外环境中进行现实世界的实验。物理系统的结果进一步展示了我们在挑战现实世界环境中的方法的适用性。
translated by 谷歌翻译
大多数现有的神经体系结构搜索(NAS)基准和算法优先考虑了良好的任务,例如CIFAR或Imagenet上的图像分类。这使得在更多样化的领域的NAS方法的表现知之甚少。在本文中,我们提出了NAS-Bench-360,这是一套基准套件,用于评估超出建筑搜索传统研究的域的方法,并使用它来解决以下问题:最先进的NAS方法在多样化的任务?为了构建基准测试,我们策划了十个任务,这些任务涵盖了各种应用程序域,数据集大小,问题维度和学习目标。小心地选择每个任务与现代CNN的搜索方法互操作,同时可能与其原始开发领域相距遥远。为了加快NAS研究的成本,对于其中两个任务,我们发布了包括标准CNN搜索空间的15,625个体系结构的预定性能。在实验上,我们表明需要对NAS BENCH-360进行更强大的NAS评估,从而表明几种现代NAS程序在这十个任务中执行不一致,并且有许多灾难性差的结果。我们还展示了NAS Bench-360及其相关的预算结果将如何通过测试NAS文献中最近推广的一些假设来实现未来的科学发现。 NAS-Bench-360托管在https://nb360.ml.cmu.edu上。
translated by 谷歌翻译
我们提出和研究内核偶联梯度方法(KCGM),并在可分离的希尔伯特空间上进行最小二乘回归的随机投影。考虑两种类型的随机草图和nyStr \“ {o} m子采样产生的随机投影,我们在适当的停止规则下证明了有关算法的规范变体的最佳统计结果。尤其是我们的结果表明,如果投影维度显示了投影维度与问题的有效维度成正比,带有随机草图的KCGM可以最佳地概括,同时获得计算优势。作为推论,我们在良好条件方面的经典KCGM得出了最佳的经典KCGM,因为目标函数可能不会不会在假设空间中。
translated by 谷歌翻译
在本文中,我们研究了可分离的希尔伯特空间的回归问题,并涵盖了繁殖核希尔伯特空间的非参数回归。我们研究了一类光谱/正则化算法,包括脊回归,主成分回归和梯度方法。我们证明了最佳,高概率的收敛性在研究算法的规范变体方面,考虑到对假设空间的能力假设以及目标函数的一般源条件。因此,我们以最佳速率获得了几乎确定的收敛结果。我们的结果改善并推广了先前的结果,以填补了无法实现的情况的理论差距。
translated by 谷歌翻译
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
translated by 谷歌翻译
As one of the most important psychic stress reactions, micro-expressions (MEs), are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts of several spontaneous ME datasets to alleviate this problem, it is still a tiny amount of work. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments to objectively verify the validity of DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER respectively on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
translated by 谷歌翻译
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this scene, low image resolution and noise interference are new challenges faced in surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by combining the super-resolution network. (2) Using generated sample pairs to simulate quality variance distributions to help contrastive learning strategies obtain robust feature representation under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
translated by 谷歌翻译
Interview has been regarded as one of the most crucial step for recruitment. To fully prepare for the interview with the recruiters, job seekers usually practice with mock interviews between each other. However, such a mock interview with peers is generally far away from the real interview experience: the mock interviewers are not guaranteed to be professional and are not likely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to have online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from the online interview data and provides mock interview services to the job seekers. The task is challenging in two ways: (1) the interview data are now available but still of low-resource; (2) to generate meaningful and relevant interview dialogs requires thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and dialog generator so that most parameters can be trained with ungrounded dialogs as well as the resume data that are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results to generate mock interviews. With the help of EZInterviewer, we hope to make mock interview practice become easier for job seekers.
translated by 谷歌翻译
Nowadays, time-stamped web documents related to a general news query floods spread throughout the Internet, and timeline summarization targets concisely summarizing the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process with sequential information remained and use it to simulate the evolutionary attention of the ground truth summary. The event-level attention can also be used to assist in extracting summary, where the extracted summary also comes in time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
translated by 谷歌翻译